Search Results: "kirk"

5 October 2013

Ian Campbell: qcontrol: support for x86 devices

Until now qcontrol has mostly supported only ARM (kirkwood) based devices (upstream has a configuration example for the HP Media Vault too, but I don't know if it is used). Debian bug #712191 asked for at least some basic support for x86 based devices. These mostly don't use the QNAP PIC found on the ARM devices, so much of qcontrol's functionality is irrelevant, but at least some of them do have a compatible A125 LCD.

Unfortunately I don't have any x86 QNAP devices and I've been unable to figure out a way to detect that we are running on a QNAP box as opposed to any random x86 box, so I've not been able to implement the hardware auto-detection used on ARM to configure qcontrol for the appropriate device at installation time. I don't want to include a default configuration which tries to drive an LCD on some random serial port, since I have no way of knowing what will be on the other end or what the device might do if sent random bytes of the LCD control protocol. So I've implemented debconf prompting for the device type, which is used only if auto-detection fails, so it shouldn't change anything for existing users on ARM.

You can find this in version 0.5.2-3~exp1 in experimental (see DebianExperimental on the Debian wiki for how to use experimental). Currently the package only knows about the existing set of ARM platforms and a default "unknown" platform, which has an empty configuration. If you have a QNAP device (ARM or x86) which is not currently supported then please install the package from experimental and tailor /etc/qcontrol.conf for your platform (e.g. by uncommenting the a125 support and giving it the correct serial port). Then send me the result along with the device's name. If the device is an ARM one, please also send me the contents of /proc/cpuinfo so I can implement auto-detection.
If you know how to detect a particular x86 QNAP device programmatically (via DMI decoding, PCI probing, sysfs etc, but make sure it is 100% safe on non-QNAP platforms) then please do let me know.
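For the record, here is a sketch of the sort of sysfs-based DMI check I have in mind, assuming (and this is purely an assumption, since I have no such hardware to verify against) that QNAP's x86 boxes report a recognisable sys_vendor string:

```shell
# detect_qnap VENDOR-STRING -> prints "qnap" or "unknown".
# On a real system the string would come from the DMI tables, e.g.:
#   detect_qnap "$(cat /sys/class/dmi/id/sys_vendor)"
# NOTE: the "QNAP*" pattern is a guess, not a verified value.
detect_qnap() {
    case "$1" in
        QNAP*) echo qnap ;;
        *)     echo unknown ;;
    esac
}
```

Something like this would be safe on non-QNAP machines (reading sysfs has no side effects), but it only works if the vendor actually fills in the DMI strings, which is exactly what needs confirming.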

3 October 2013

Tollef Fog Heen: Fingerprints as lightweight authentication

Dustin Kirkland recently wrote that "Fingerprints are usernames, not passwords". I don't really agree, I think fingerprints are fine for lightweight authentication. iOS at least allows you to only require a pass code after a time period has expired, so you don't have to authenticate to the phone all the time. Replacing no authentication with weak authentication (but only for a fairly short period) will improve security over the current status, even if it's not perfect. Having something similar for Linux would also be reasonable, I think. Allow authentication with a fingerprint if I've only been gone for lunch (or maybe just for a trip to the loo), but require password or token if I've been gone for longer. There's a balance to be struck between convenience and security.

6 May 2013

Martin Michlmayr: Upgrading to Debian 7.0 (wheezy) on ARM

Debian 7.0 (wheezy) has been released. Here are some notes if you're running Debian on an ARM-based NAS device or plug computer and are planning to upgrade. First of all, if you're running Debian on a plug computer, such as the SheevaPlug, make sure that you have u-boot version 2011.12-3 (or higher). If you're using an older version, the Linux kernel in wheezy will not boot! You can read my u-boot upgrade instructions on how to check the version of u-boot and upgrade it. Second, check your /etc/kernel-img.conf file. If it still contains the following line, please remove this line.
postinst_hook = flash-kernel
This postinst_hook directive was needed in the past but flash-kernel is called automatically nowadays whenever you install a new kernel. Now you're almost ready to start with your upgrade. Before you start, make sure to read the release notes for Debian 7.0 on ARM. This document contains a lot of information on performing a successful upgrade. During the kernel upgrade, you'll get the following message about the boot loader configuration:
The boot loader configuration for this system was not recognized. These
settings in the configuration may need to be updated:
 * The root device ID passed as a kernel parameter;
 * The boot device ID used to install and update the boot loader.
On ARM-based NAS devices and plug computers, you can simply ignore this warning. We put the root device into the ramdisk so it will be updated automatically. There are no other issues I'm aware of, so good luck with your upgrade and have fun with Debian wheezy!
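For those who prefer to script the /etc/kernel-img.conf cleanup mentioned above, here is a small sketch using GNU sed:

```shell
# Illustration of the kernel-img.conf cleanup: delete the obsolete
# postinst_hook line with GNU sed, keeping a .bak backup.  Shown here
# against a scratch copy; run the same sed command (as root) against
# /etc/kernel-img.conf to do it for real.
CONF=$(mktemp)
printf 'do_symlinks = yes\npostinst_hook = flash-kernel\n' > "$CONF"
sed -i.bak '/^postinst_hook *= *flash-kernel/d' "$CONF"
cat "$CONF"    # only "do_symlinks = yes" remains
```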

26 January 2013

Ben Hutchings: What's in the Linux kernel for Debian 7.0 'wheezy', part 4

The Debian package of the Linux kernel is based on Linux 3.2, but has some additional features. This continues from parts 1, 2 and 3, and covers new and improved hardware support.

DRM drivers from Linux 3.4 (proposed)

Some recent Intel and AMD graphics chips were not supported well or at all by the DRM (Direct Rendering Manager) drivers in Linux 3.2. Although many bug fixes have been included in 3.2.y stable updates, we are considering updating them to the versions found in Linux 3.4. (We did something similar for Debian 6.0 'squeeze'.) Julien Cristau has been working on this and has prepared binary packages. To install, run as root:
gpg --no-default-keyring --keyring /usr/share/keyrings/debian-keyring.gpg --export 310180050905E40C | apt-key add -
echo deb http://people.debian.org/~jcristau/wheezy-drm34/ ./ > /etc/apt/sources.list.d/jcristau-wheezy-drm34.list
apt-get update
apt-get install linux-image-3.2.0-4.drm-amd64  # or -486, or -686-pae
Please test these and report your results to bug #687442. I would suggest testing suspend/resume (if applicable), use of internal and external displays on laptops, and 3D accelerated graphics (games and compositing window managers such as GNOME Shell). Remember that AMD/ATI graphics chips require the firmware from the firmware-linux-nonfree package for 3D acceleration and many other features.

amilo-rfkill driver

I wrote a standard rfkill driver for the Fujitsu-Siemens Amilo A1655 and M7440, based on the out-of-tree fsaa1655g and fsam7440 drivers. Unlike those, it should work with the rfkill command, Network Manager, etc.

ALPS touchpads

Newer touchpads made by ALPS use different protocols for reporting scroll and pinch gestures. Jonathan Nieder backported the changes to support these.

Wacom tablets

Jonathan Nieder updated the wacom driver to the version in Linux 3.5, adding support for the Intuos5, Bamboo Connect, Bamboo 16FG and various other models.

Emulex OneConnect 'Skyhawk'

Sarveshwar Bandi at Emulex contributed a backport of the be2net driver from Linux 3.5, adding support for their 'Skyhawk' chip.

Marvell Kirkwood

Marvell's Kirkwood SoCs have been supported upstream for some time and in Debian since release 6.0 'squeeze'. However new models and boards generally require specific support. Arnaud Patard backported support for the 6282 rev A1 chip found in QNAP TS-x19P II models, and for the Marvell Dreamplug and Iomega Iconnect.

More to come

Missing hardware support is an important bug that can be fixed by kernel updates during a freeze and throughout the lifetime of a stable release. If you know that new hardware isn't supported by the Debian kernel, please open a bug report. I can't promise that it will be fixed, but we need to know what's missing. Hardware vendors that maintain their own drivers upstream (not out-of-tree) are especially welcome to contribute tested backports that add support for the latest hardware.
Send mail to debian-kernel@lists.debian.org if you have any questions about this.

5 January 2013

Ian Campbell: Debugging Initramfs Issues With No Console

The QNAP TS-XXX NAS boxes are nice little systems but one thing I really miss is a console. Martin Michlmayr has some instructions (TS-119, TS-219, TS-41x) on his excellent Debian On QNAP pages on how to build a suitable adaptor, but sadly, even though I've managed to get all the parts, I've been too lazy to actually solder the thing together (it doesn't help that I nearly always burn myself when I use a soldering iron!). This became more pressing when someone reported that Debian bug #693263 in qcontrol was not fixed in Wheezy, since the issue appeared to be in the initramfs hook. Worse, it seemed like something in my proposed fix was causing the system to not boot at all! Having finally found the hardware needed to reproduce the issue, my first thought was to try netconsole. To do this I edited /etc/initramfs-tools/modules to add:
      mv643xx_eth
      netconsole netconsole=@<IP>/eth0,@<DST-IP>/<DST-MAC>
The option syntax is described in netconsole.txt but briefly: <IP> is the address of the TS-419P II I'm debugging on, and <DST-IP> and <DST-MAC> are the IP and MAC address of another machine on the network. Having done that, run update-initramfs -u and reboot; netcat -u -l -p 6666 on the other machine then shows all the kernel messages. So far so good, but this doesn't get me any debugging from the userspace portions of the initramfs. To get those we have to get a bit hacky by editing /usr/share/initramfs-tools/init, first to change:

     # Parse command line options
    -for x in $(cat /proc/cmdline); do
    +for x in $(cat /proc/cmdline) debug; do
            case $x in
and secondly:

             debug)
                     debug=y
                     quiet=n
    -                exec >/run/initramfs/initramfs.debug 2>&1
    +                exec >/dev/kmsg 2>&1
                     set -x
                     ;;
The first of these simulates adding debug to the kernel command line (which can't otherwise easily be edited on these systems) and the second redirects the initramfs process's output to the kernel log. The overall effect is that the output of the initramfs processes appears over netcat.

11 November 2012

Ian Campbell: Becoming A Debian Maintainer

This is slightly old news now, but I've decided to start blogging and this seems like a good place to start. At this year's debconf in Nicaragua I finally got around to applying to be a Debian Maintainer. Thanks to the various people who advocated me (and prodded me to apply in the first place!). My key was added to the DM keyring in mid-August. I intend to continue to maintain the various IVTV capture card packages that I maintain (ivtv-utils and xserver-xorg-video-ivtv) as well as continuing my involvement with the packaging of Xen and associated stuff like the kernel and installer. I also plan to keep working on improved support for the DreamPlug, for which I have a few fixes queued up for post-Wheezy already. More recently I've bought a QNAP TS-210 (which is a nice little NAS that can run Debian) and ended up adopting the qcontrol package, which became my first actual upload to Debian! Anyway, hopefully I'll manage to post here at least semi-regularly.

19 May 2012

Johannes Schauer: network file transfer to marvell kirkwood

I have a Seagate GoFlex Net with two 2TB harddrives attached to it via SATA. The device itself is connected to my PC via its Gigabit Ethernet connection. It houses a Marvell Kirkwood at 1.2GHz and 128MB. I am booting Debian from a USB stick connected to its USB 2.0 port. The specs are pretty neat so I planned it as my NAS with 4TB of storage being attached to it. The most common use case is the transfer of big files (1-10 GB) between my laptop and the device. Now what are the common ways to achieve this? scp:
scp /local/path user@goflex:/remote/path
rsync:
rsync -Ph /local/path user@goflex:/remote/path
sshfs:
sshfs -o user@goflex:/remote/path /mnt
cp /local/path /mnt
ssh:
ssh user@goflex "cat > /remote/path" < /local/path
I then did some benchmarks to see how they perform: scp: 5.90 MB/s, rsync: 5.16 MB/s, sshfs: 5.05 MB/s, ssh: 5.42 MB/s. Since they all use ssh for transmission, the similarity of the results does not come as a surprise, and 5.90 MB/s is also not too shabby for a plain scp. It means that I can transfer 1 GB in a bit under three minutes. I could live with that. Even for 10 GB files I would only have to wait for half an hour, which is mostly okay since it is usually known well in advance that a file is needed. But let's see if we can somehow get faster than this. Let's analyze where the bottleneck is. Let's have a look at the effective TCP transfer rate with netcat:
ssh user@goflex "netcat -l -p 8000 > /dev/null"
dd if=/dev/zero bs=10M count=1000 | netcat goflex 8000
79.3 MB/s, wow! Can we get more? Let's try increasing the buffer size on both ends. This can be done using nc6 with the -x argument on both sides.
ssh user@goflex "netcat -x -l -p 8000 > /dev/null"
dd if=/dev/zero bs=10M count=1000 | netcat -x goflex 8000
103 MB/s. Okay, the network is definitely NOT the bottleneck here. Let's see how fast I can read from my harddrive:
hdparm -tT /dev/sda
114.86 MB/s.. hmm... and writing to it?
ssh user@goflex "time sh -c 'dd if=/dev/zero of=/remote/path bs=10M count=100; sync'"
42.93 MB/s. Those values are far faster than the puny 5.90 MB/s I get with scp. A look at the CPU usage during transfer shows that the ssh process is at 100% CPU usage the whole time. The bottleneck, it seems, is ssh and the encryption/decryption involved. I'm transferring directly from my laptop to the device; not even a switch is in the middle, so encryption seems to be quite pointless here. Even authentication doesn't seem to be necessary in this setup. So how to make the transfer unencrypted? The ssh protocol specifies a null cipher for not-encrypted connections. OpenSSH doesn't support this. Supposedly, adding
  "none", SSH_CIPHER_NONE, 8, 0, 0, EVP_enc_null  
to cipher.c adds a null cipher, but I didn't want to patch around in my installation. So let's see how a plain netcat performs.
ssh user@goflex "netcat -l -p 8000 > /remote/path"
netcat goflex 8000 < /local/path
32.9 MB/s. This is far better! Let's try a bigger buffer:
ssh user@goflex "netcat -x -l -p 8000 > /remote/path"
netcat -x goflex 8000 < /local/path
37.8 MB/s, better still! My gigabyte will now take under half a minute and my 10 GB file under five minutes. But it is tedious to copy multiple files or even a whole directory structure with netcat. There are far better tools for this. An obvious candidate that doesn't encrypt is rsync when used with the rsync protocol.
rsync -Ph /local/path user@goflex::module/remote/path
30.96 MB/s, which is already much better than the ssh-based transfers! I used the following line to have the rsync daemon started by inetd:
rsync stream tcp nowait root /usr/bin/rsync rsyncd --daemon
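The ::module syntax used above relies on a module being defined in /etc/rsyncd.conf; a minimal sketch (the module name and path here are illustrative, not my actual setup) would look something like:

```
[module]
    path = /remote/path
    read only = false
    uid = nobody
```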
But it is slower than pure netcat. If we want directory trees, then how about netcatting a tarball?
ssh user@goflex "netcat -x -l -p 8000 | tar -C /remote/path -x"
tar -c /local/path | netcat goflex 8000
26.2 MB/s, so tar seems to add quite the overhead. How about ftp then? For this test I installed vsftpd and achieved a speed of 30.13 MB/s. This compares well with rsync. I also tried out nfs. Not surprisingly, its transfer rate is on par with rsync and ftp at 31.5 MB/s. So what did I learn? Let's make a table:
method           speed in MB/s
scp              5.90
rsync+ssh        5.16
sshfs            5.05
ssh              5.42
netcat           32.9
netcat -x        37.8
netcat -x | tar  26.2
rsync            30.96
ftp              30.13
nfs              31.5
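The speeds in the table translate directly into waiting time; a quick helper to turn a (whole-number) rate into seconds per gigabyte, using plain integer shell arithmetic:

```shell
# seconds_per_gb RATE -> whole seconds needed to move 1 GB (1000 MB)
# at RATE MB/s; integer arithmetic, so fractional rates are rounded down.
seconds_per_gb() {
    echo $(( 1000 / $1 ))
}
seconds_per_gb 5    # prints 200 (scp-class speeds)
seconds_per_gb 37   # prints 27  (netcat -x class speeds)
```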
For transfer of a directory structure or many small files, unencrypted rsync seems the way to go. It outperforms a copy over ssh more than five-fold. When the convenience of having the remote data mounted locally is needed, nfs outperforms sshfs at speeds similar to rsync and ftp. As rsync and nfs already provide good performance, I didn't look into a more convenient solution using ftp. My policy will now be to use rsync for partial file transfers and mount my remote files with nfs. For transfer of one huge file, netcat is faster. Especially with increased buffer sizes it is a quarter faster than without. But copying a file with netcat is tedious, hence I wrote a script that simplifies the whole remote-login, listen, send process to one command. The first argument is the local file; the second argument is the remote name and path, just as in scp.
#!/bin/sh -e
HOST=${2%%:*}
USER=${HOST%%@*}
if [ "$HOST" = "$2" -o "$USER" = "$HOST" ]; then
        echo "second argument is not of form user@host:path" >&2
        exit 1
fi
HOST=${HOST#*@}
LPATH=$1
LNAME=`basename "$1"`
RPATH=`printf %q "${2#*:}/$LNAME"`
ssh "$USER@$HOST" "nc6 -x -l -p 8000 > $RPATH" &
sleep 1.5
pv "$LPATH" | nc6 -x "$HOST" 8000
wait $!
ssh "$USER@$HOST" "md5sum $RPATH" &
md5sum "$LPATH"
wait $!
I use pv to get a status of the transfer on my local machine, and ssh to log in to the remote machine and start netcat in listening mode. After the transfer I check the md5sums to be sure that everything went fine. This step can of course be left out, but during testing it was useful. Escaping of the arguments is done with printf %q. There are several problems with the above. The sleep cannot be avoided: it must be there to give the remote side some time to start netcat and listen, which is unclean. Another problem is that one has to specify a username, and another that in scp one has to double-escape the argument while here this is not necessary. The host that it netcats to is the same as the host it sshs to, which is not necessarily the case, as one can specify an alias in ~/.ssh/config. Last but not least, this only transfers from the local machine to the remote host. Doing it the other way round is of course possible in the same manner, but then one must be able to tell how the local machine is reachable from the remote host. Due to all those inconveniences I decided not to expand on the script. Plus, rsync and nfs perform well enough for day to day use.

8 December 2011

Tanguy Ortolo: Return codes

According to Eric S. Raymond, one rule for Unix programming is the Rule of Silence: when a program has nothing surprising to say, it should say nothing. Since return codes are a form of information, I would like to add: when a program has carried out its task with no surprise, it should exit(0). Take xscreensaver-command -lock, for instance. That command is meant to make a running XScreenSaver lock the screen. I have never seen it fail to do so, yet I sometimes get a non-zero code:
% xscreensaver-command -lock
xscreensaver-command: could not frobnicate window 4212
zsh: exit 1   xscreensaver-command -lock
Well, xscreensaver-command, I have the deep regret to announce to you that despite your surprising return code, you successfully succeeded in locking my screen. But that return code is wrong and annoying. Wrong, because as a user, I do not care if xscreensaver-command -lock failed to frobnicate some window it would have loved to, as long as it does its job and locks my screen. From the user's point of view, locking the screen is binary: either it is done and it is a success, or it is not done and that is a failure. And it is annoying, because it makes the return code meaningless and prevents you from using it to do things like:
% # Lock screen and suspend to RAM (only if locking was successful)
% xscreensaver-command -lock && sudo pm-suspend
With xscreensaver-command sometimes returning a non-zero code for no apparent reason, you end up with a locked screen and your computer still running, and have to unlock it and try again. This looks like this meaningless dialogue: So please, when you write a utility, design your return code for the user, not for you. If your utility has done the job, it should exit(0). What happened inside may interest you as the developer, but it does not matter to the user as long as the job is done, so it should not be reported to them, especially not with a misleading failure code.
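The rule is easy to follow in practice: report the user-visible outcome, not incidental internal failures. A sketch (do_lock and frobnicate_win are made-up stand-ins, the latter named after the error message above):

```shell
# Exit-status design for the user: succeed when the user-visible job
# (locking) was done, even if an incidental step failed along the way.
do_lock()        { :; }          # stand-in for the real locking action
frobnicate_win() { return 1; }   # incidental step that may fail

lock_and_report() {
    do_lock || { echo "failed to lock" >&2; return 1; }
    frobnicate_win || echo "warning: could not frobnicate window" >&2
    return 0    # job done: report success despite the warning
}
```

The warning still goes to stderr for whoever cares, but the return code reflects only whether the screen actually got locked.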

24 July 2011

Martin Michlmayr: Upgrade to mainline U-Boot from Debian archive

When Marvell originally released the first plug computer, they created their own version of u-boot with support for their new devices. Unfortunately, this version of u-boot is fairly out of date nowadays compared to mainline u-boot and has several problems. Support for plug computers (such as SheevaPlug and GuruPlug) has been integrated into mainline u-boot (also known as DENX u-boot) in the meantime, and Clint Adams has packaged it for Debian. I finally found the time to test Clint's u-boot binary on my devices and have updated the SheevaPlug installation guide accordingly. If you have installed Debian to a SheevaPlug according to my instructions, I suggest you upgrade. If you boot from a MMC/SD card, you should be aware that the mmcinit command has been renamed to mmc init in order to be consistent with the naming of other commands. You'll therefore have to update your bootcmd_mmc variable in u-boot like this:
setenv bootcmd_mmc 'mmc init; ext2load mmc 0:1 0x00800000 /uImage; ext2load mmc 0:1 0x01100000 /uInitrd'
saveenv

13 June 2011

Russell Coker: Evil Psychologists

Last year the Psychologist and Baptist minister George Rekers, who is famous for anti-homosexuality pseudo-science, was discovered to be hiring gay escorts from Rentboy.com. Lots of LULZ there. But the story didn't end there. It turns out that George Rekers did some research on a child who ended up committing suicide as an adult, and the circumstantial evidence suggests that George's actions are directly related to the suicide [1]. The Rentboy.com affair doesn't seem so funny now. The Box Turtle Bulletin has a series of articles about Kirk Andrew Murphy's suicide and the roles of George Rekers and Richard Green in all of this [2]; the articles are well written and generally appear to be well researched. I recommend reading the articles if you can stomach them (lots of nasty stuff is described). The section answering the question of who's responsible for the mistreatment of Kirk Andrew Murphy [3], where they describe the use of ABA (AKA the Lovaas Technique), is interesting. Ivar Lovaas worked with George Rekers in such research and published a paper with him. The term ABA gets an immediate hostile reaction in the Autism community, but until now I hadn't realised why so many people hate it so much. It seems that to some extent I made the classic mistake of misjudging the reports of Autistic people who are unable to present their case well (as opposed to the psychologists who can present any position very well even if it's utterly insane). In the past I had the impression that ABA wasn't inherently bad, it was just implemented in a bad way in some cases; now it seems that ABA was designed in an evil way right from the start. There is one massive problem with the Box Turtle analysis, he says "Behavioral analysts don't dig around much into people's feelings, fears, dreams, family relationships or childhood memories. Indeed, in cases like autism, Lovaas's specialty, those avenues of exploration would be irrelevant".
It could be that Jim Burroway (the Box Turtle writer) is merely quoting someone else without attribution, but even so, saying that the feelings, fears, and dreams of a group of people are irrelevant is just awful; a statement that denies the humanity of a group of people can't be quoted without further explanation. In his article about ABA Jim refers to childhood Autism as a condition for which there is "no hope for interior change" [4]. I'm not sure if he's just saying that Autistic children are incapable of learning or whether it's all Autistic people; in either case it's nonsense in terms of science and nasty as well. Generally I expect that members of various minority groups will show more sympathy to each other than they receive from the general population. Jim's posts are a great disappointment. I understand that he would be rather stressed about the horrible things that George Rekers et al did, but even so he should be able to avoid that sort of thing. Jim is obviously a very talented writer and can do better. One might think that Jim's posts use the word Autism to refer only to the people who are non-verbal (or in other ways less capable than the huge number of Autistic people who work for companies like Google and IBM). But that's no excuse either. You can find blogs and essays written by non-verbal Autistic people that describe their experiences if you care to search for them. It's obvious that they are people too and deserve to be treated as people, not objects. Abusing Autistic children to try and make them impersonate NT children is no less evil than abusing children who don't fit gender norms.

2 November 2010

Martin Michlmayr: The Boot Process of the SheevaPlug running Debian

I received a number of questions as to how the boot process of the SheevaPlug running Debian works. I've now published an explanation of how u-boot loads the Debian kernel and ramdisk in order to boot Debian.

21 October 2010

Mark Brown: ASoC changes in 2.6.36

Linux 2.6.36 was released today. This has been a very quiet release for the ASoC core code but one of the busiest releases for a while for new CPU support, with four new architectures added:

15 October 2010

Enrico Zini: Award winning code

Yuwei and I had a fun day at hhhmcr (#hhhmcr) and even managed to put together a prototype that won the first prize \o/ We played with the gmp24 dataset kindly extracted from Twitter by Michael Brunton-Spall of the Guardian into a convenient JSON dataset. The idea was to find ways of making it easier to look at the data and making sense of it. This is the story of what we did, including the code we wrote. The original dataset has several JSON files, so the first task was to put them all together:
#!/usr/bin/python
# Merge the JSON data
# (C) 2010 Enrico Zini <enrico@enricozini.org>
# License: WTFPL version 2 (http://sam.zoy.org/wtfpl/)
import simplejson
import os
res = []
for f in os.listdir("."):
    if not f.startswith("gmp24"): continue
    data = open(f).read().strip()
    if data == "[]": continue
    parsed = simplejson.loads(data)
    res.extend(parsed)
print simplejson.dumps(res)
The results, however, were not ordered by date, as GMP had to use several accounts to tweet because Twitter was putting Greater Manchester Police into jail for generating too much traffic. There would be quite a bit to write about that, but let's stick to our work. Here is code to sort the JSON data by time:
#!/usr/bin/python
# Sort the JSON data
# (C) 2010 Enrico Zini <enrico@enricozini.org>
# License: WTFPL version 2 (http://sam.zoy.org/wtfpl/)
import simplejson
import sys
import datetime as dt
all_recs = simplejson.load(sys.stdin)
all_recs.sort(key=lambda x: dt.datetime.strptime(x["created_at"], "%a %b %d %H:%M:%S +0000 %Y"))
simplejson.dump(all_recs, sys.stdout)
I then wanted to play with Tf-idf for extracting the most important words of every tweet:
#!/usr/bin/python
# tfidf - Annotate JSON elements with Tf-idf extracted keywords
#
# Copyright (C) 2010  Enrico Zini <enrico@enricozini.org>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.
import sys, math
import simplejson
import re
# Read all the twits
records = simplejson.load(sys.stdin)
# All the twits by ID
byid = dict(((x["id"], x) for x in records))
# Stopwords we ignore
stopwords = set(["by", "it", "and", "of", "in", "a", "to"])
# Tokenising engine
re_num = re.compile(r"^\d+$")
re_word = re.compile(r"(\w+)")
def tokenise(tweet):
    "Extract tokens from a tweet"
    for tok in tweet["text"].split():
        tok = tok.strip().lower()
        if re_num.match(tok): continue
        mo = re_word.match(tok)
        if not mo: continue
        if mo.group(1) in stopwords: continue
        yield mo.group(1)
# Extract tokens from tweets
tokenised = dict(((x["id"], list(tokenise(x))) for x in records))
# Aggregate token counts
aggregated = {}
for d in byid.iterkeys():
    for t in tokenised[d]:
        if t in aggregated:
            aggregated[t] += 1
        else:
            aggregated[t] = 1
def tfidf(doc, tok):
    "Compute TFIDF score of a token in a document"
    return doc.count(tok) * math.log(float(len(byid)) / aggregated[tok])
# Annotate tweets with keywords
res = []
for name, tweet in byid.iteritems():
    doc = tokenised[name]
    keywords = sorted(set(doc), key=lambda tok: tfidf(doc, tok), reverse=True)[:5]
    tweet["keywords"] = keywords
    res.append(tweet)
simplejson.dump(res, sys.stdout)
I thought this was producing a nice summary of every tweet but nobody was particularly interested, so we moved on to adding categories to tweets. Thanks to Yuwei, who put together some useful keyword sets, we managed to annotate each tweet with a place name (e.g. "Stockport"), a social place name (e.g. "pub", "bank") and a social category (e.g. "man", "woman", "landlord"...). The code is simple; most of the work in it was the dictionary of keywords:
#!/usr/bin/python
# categorise - Annotate JSON elements with categories
#
# Copyright (C) 2010  Enrico Zini <enrico@enricozini.org>
# Copyright (C) 2010  Yuwei Lin <yuwei@ylin.org>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.
import sys, math
import simplejson
import re
# Electoral wards from http://en.wikipedia.org/wiki/List_of_electoral_wards_in_Greater_Manchester
placenames = ["Altrincham", "Sale West",
"Altrincham", "Ashton upon Mersey", "Bowdon", "Broadheath", "Hale Barns", "Hale Central", "St Mary", "Timperley", "Village",
"Ashton-under-Lyne",
"Ashton Hurst", "Ashton St Michael", "Ashton Waterloo", "Droylsden East", "Droylsden West", "Failsworth East", "Failsworth West", "St Peter",
"Blackley", "Broughton",
"Broughton", "Charlestown", "Cheetham", "Crumpsall", "Harpurhey", "Higher Blackley", "Kersal",
"Bolton North East",
"Astley Bridge", "Bradshaw", "Breightmet", "Bromley Cross", "Crompton", "Halliwell", "Tonge with the Haulgh",
"Bolton South East",
"Farnworth", "Great Lever", "Harper Green", "Hulton", "Kearsley", "Little Lever", "Darcy Lever", "Rumworth",
"Bolton West",
"Atherton", "Heaton", "Lostock", "Horwich", "Blackrod", "Horwich North East", "Smithills", "Westhoughton North", "Chew Moor", "Westhoughton South",
"Bury North",
"Church", "East", "Elton", "Moorside", "North Manor", "Ramsbottom", "Redvales", "Tottington",
"Bury South",
"Besses", "Holyrood", "Pilkington Park", "Radcliffe East", "Radcliffe North", "Radcliffe West", "St Mary", "Sedgley", "Unsworth",
"Cheadle",
"Bramhall North", "Bramhall South", "Cheadle", "Gatley", "Cheadle Hulme North", "Cheadle Hulme South", "Heald Green", "Stepping Hill",
"Denton", "Reddish",
"Audenshaw", "Denton North East", "Denton South", "Denton West", "Dukinfield", "Reddish North", "Reddish South",
"Hazel Grove",
"Bredbury", "Woodley", "Bredbury Green", "Romiley", "Hazel Grove", "Marple North", "Marple South", "Offerton",
"Heywood", "Middleton",
"Bamford", "Castleton", "East Middleton", "Hopwood Hall", "Norden", "North Heywood", "North Middleton", "South Middleton", "West Heywood", "West Middleton",
"Leigh",
"Astley Mosley Common", "Atherleigh", "Golborne", "Lowton West", "Leigh East", "Leigh South", "Leigh West", "Lowton East", "Tyldesley",
"Makerfield",
"Abram", "Ashton", "Bryn", "Hindley", "Hindley Green", "Orrell", "Winstanley", "Worsley Mesnes",
"Manchester Central",
"Ancoats", "Clayton", "Ardwick", "Bradford", "City Centre", "Hulme", "Miles Platting", "Newton Heath", "Moss Side", "Moston",
"Manchester", "Gorton",
"Fallowfield", "Gorton North", "Gorton South", "Levenshulme", "Longsight", "Rusholme", "Whalley Range",
"Manchester", "Withington",
"Burnage", "Chorlton", "Chorlton Park", "Didsbury East", "Didsbury West", "Old Moat", "Withington",
"Oldham East", "Saddleworth",
"Alexandra", "Crompton", "Saddleworth North", "Saddleworth South", "Saddleworth West", "Lees", "St James", "St Mary", "Shaw", "Waterhead",
"Oldham West", "Royton",
"Chadderton Central", "Chadderton North", "Chadderton South", "Coldhurst", "Hollinwood", "Medlock Vale", "Royton North", "Royton South", "Werneth",
"Rochdale",
"Balderstone", "Kirkholt", "Central Rochdale", "Healey", "Kingsway", "Littleborough Lakeside", "Milkstone", "Deeplish", "Milnrow", "Newhey", "Smallbridge", "Firgrove", "Spotland", "Falinge", "Wardle", "West Littleborough",
"Salford", "Eccles",
"Claremont", "Eccles", "Irwell Riverside", "Langworthy", "Ordsall", "Pendlebury", "Swinton North", "Swinton South", "Weaste", "Seedley",
"Stalybridge", "Hyde",
"Dukinfield Stalybridge", "Hyde Godley", "Hyde Newton", "Hyde Werneth", "Longdendale", "Mossley", "Stalybridge North", "Stalybridge South",
"Stockport",
"Brinnington", "Central", "Davenport", "Cale Green", "Edgeley", "Cheadle Heath", "Heatons North", "Heatons South", "Manor",
"Stretford", "Urmston",
"Bucklow-St Martins", "Clifford", "Davyhulme East", "Davyhulme West", "Flixton", "Gorse Hill", "Longford", "Stretford", "Urmston",
"Wigan",
"Aspull New Springs Whelley", "Douglas", "Ince", "Pemberton", "Shevington with Lower Ground", "Standish with Langtree", "Wigan Central", "Wigan West",
"Worsley", "Eccles South",
"Barton", "Boothstown", "Ellenbrook", "Cadishead", "Irlam", "Little Hulton", "Walkden North", "Walkden South", "Winton", "Worsley",
"Wythenshawe", "Sale East",
"Baguley", "Brooklands", "Northenden", "Priory", "Sale Moor", "Sharston", "Woodhouse Park"]
# Manual coding from Yuwei
placenames.extend(["City centre", "Tameside", "Oldham", "Bury", "Bolton",
"Trafford", "Pendleton", "New Moston", "Denton", "Eccles", "Leigh", "Benchill",
"Prestwich", "Sale", "Kearsley", ])
placenames.extend(["Trafford", "Bolton", "Stockport", "Levenshulme", "Gorton",
"Tameside", "Blackley", "City centre", "Airport", "South Manchester",
"Rochdale", "Chorlton", "Uppermill", "Castleton", "Stalybridge", "Ashton",
"Chadderton", "Bury", "Ancoats", "Whalley Range", "West Yorkshire",
"Fallowfield", "New Moston", "Denton", "Stretford", "Eccles", "Pendleton",
"Leigh", "Altrincham", "Sale", "Prestwich", "Kearsley", "Hulme", "Withington",
"Moss Side", "Milnrow", "outskirt of Manchester City Centre", "Newton Heath",
"Wythenshawe", "Mancunian Way", "M60", "A6", "Droylesden", "M56", "Timperley",
"Higher Ince", "Clayton", "Higher Blackley", "Lowton", "Droylsden",
"Partington", "Cheetham Hill", "Benchill", "Longsight", "Didsbury",
"Westhoughton"])
# Social categories from Yuwei
soccat = ["man", "woman", "men", "women", "youth", "teenager", "elderly",
"patient", "taxi driver", "neighbour", "male", "tenant", "landlord", "child",
"children", "immigrant", "female", "workmen", "boy", "girl", "foster parents",
"next of kin"]
for i in range(100):
    soccat.append("%d-year-old" % i)
    soccat.append("%d-years-old" % i)
# Types of social locations from Yuwei
socloc = ["car park", "park", "pub", "club", "shop", "premises", "bus stop",
"property", "credit card", "supermarket", "garden", "phone box", "theatre",
"toilet", "building site", "Crown court", "hard shoulder", "telephone kiosk",
"hotel", "restaurant", "cafe", "petrol station", "bank", "school",
"university"]
extras = {"placename": placenames, "soccat": soccat, "socloc": socloc}
# Normalise keyword lists
for k, v in extras.iteritems():
    # Remove duplicates
    v = list(set(v))
    # Sort by length, longest first, so longer names match before their prefixes
    v.sort(key=lambda x: len(x), reverse=True)
    # Assign back: rebinding v alone would discard the cleaned list
    extras[k] = v
# Add keywords
def add_categories(tweet):
    text = tweet["text"].lower()
    for field, categories in extras.iteritems():
        for cat in categories:
            if cat.lower() in text:
                tweet[field] = cat
                break
    return tweet
# Read all the twits
records = (add_categories(x) for x in simplejson.load(sys.stdin))
simplejson.dump(list(records), sys.stdout)
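The longest-first sort and substring test at the heart of add_categories can be sketched in miniature (modern Python, with made-up ward names):

```python
# Keywords are sorted longest-first so that "Cheadle Hulme North"
# wins over the shorter "Cheadle" when both occur in a tweet.
keywords = ["Cheadle", "Sale", "Cheadle Hulme North"]
keywords.sort(key=len, reverse=True)

def first_match(text, categories):
    # Return the first (i.e. longest) keyword found in the text, if any
    text = text.lower()
    for cat in categories:
        if cat.lower() in text:
            return cat
    return None

print(first_match("Burglary reported in cheadle hulme north", keywords))
# prints: Cheadle Hulme North
```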
All these scripts form a nice processing chain: each script takes a list of JSON records, adds some bits, and passes it on. In order to see what we have so far, here is a simple script to convert the JSON twits to CSV so they can be viewed in a spreadsheet:
#!/usr/bin/python
# Convert the JSON twits to CSV
# (C) 2010 Enrico Zini <enrico@enricozini.org>
# License: WTFPL version 2 (http://sam.zoy.org/wtfpl/)
import simplejson
import sys
import csv
rows = ["id", "created_at", "text", "keywords", "placename"]
writer = csv.writer(sys.stdout)
for rec in simplejson.load(sys.stdin):
    rec["keywords"] = " ".join(rec["keywords"])
    rec["placename"] = rec.get("placename", "")
    writer.writerow([rec[row] for row in rows])
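The flattening the CSV script performs can be shown on a single made-up record; the stdlib csv module is enough here (simplejson was only needed on the Pythons of the day):

```python
# One hypothetical record pushed through the same flattening as the script
import csv
import io

rows = ["id", "created_at", "text", "keywords", "placename"]
rec = {"id": 42, "created_at": "Thu Oct 14 08:00:00 2010",
       "text": "car stolen", "keywords": ["car"]}
rec["keywords"] = " ".join(rec["keywords"])        # list -> space-separated string
rec["placename"] = rec.get("placename", "")        # optional field -> empty cell

out = io.StringIO()
csv.writer(out).writerow([rec[r] for r in rows])
print(out.getvalue().strip())
# prints: 42,Thu Oct 14 08:00:00 2010,car stolen,car,
```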
At this point we were coming up with lots of questions: "were there more reports on women or men?", "which place had most incidents?", "what were the incidents involving animals?"... Time to bring Xapian into play. This script reads all the JSON tweets and builds a Xapian index with them:
#!/usr/bin/python
# toxapian - Index JSON tweets in Xapian
#
# Copyright (C) 2010  Enrico Zini <enrico@enricozini.org>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.
import simplejson
import sys
import os, os.path
import xapian
DBNAME = sys.argv[1]
db = xapian.WritableDatabase(DBNAME, xapian.DB_CREATE_OR_OPEN)
stemmer = xapian.Stem("english")
indexer = xapian.TermGenerator()
indexer.set_stemmer(stemmer)
indexer.set_database(db)
data = simplejson.load(sys.stdin)
for rec in data:
    doc = xapian.Document()
    doc.set_data(str(rec["id"]))
    indexer.set_document(doc)
    indexer.index_text_without_positions(rec["text"])
    # Index categories as categories
    if "placename" in rec:
        doc.add_boolean_term("XP" + rec["placename"].lower())
    if "soccat" in rec:
        doc.add_boolean_term("XS" + rec["soccat"].lower())
    if "socloc" in rec:
        doc.add_boolean_term("XL" + rec["socloc"].lower())
    db.add_document(doc)
db.flush()
# Also save the whole dataset so we know where to find it later if we want to
# show the details of an entry
simplejson.dump(data, open(os.path.join(DBNAME, "all.json"), "w"))
And this is a simple command-line tool to query the database:
#!/usr/bin/python
# xgrep - Command line tool to query the GMP24 tweet Xapian database
#
# Copyright (C) 2010  Enrico Zini <enrico@enricozini.org>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.
import simplejson
import sys
import os, os.path
import xapian
DBNAME = sys.argv[1]
db = xapian.Database(DBNAME)
stem = xapian.Stem("english")
qp = xapian.QueryParser()
qp.set_default_op(xapian.Query.OP_AND)
qp.set_database(db)
qp.set_stemmer(stem)
qp.set_stemming_strategy(xapian.QueryParser.STEM_SOME)
qp.add_boolean_prefix("place", "XP")
qp.add_boolean_prefix("soc", "XS")
qp.add_boolean_prefix("loc", "XL")
query = qp.parse_query(sys.argv[2],
    xapian.QueryParser.FLAG_BOOLEAN |
    xapian.QueryParser.FLAG_LOVEHATE |
    xapian.QueryParser.FLAG_BOOLEAN_ANY_CASE |
    xapian.QueryParser.FLAG_WILDCARD |
    xapian.QueryParser.FLAG_PURE_NOT |
    xapian.QueryParser.FLAG_SPELLING_CORRECTION |
    xapian.QueryParser.FLAG_AUTO_SYNONYMS)
enquire = xapian.Enquire(db)
enquire.set_query(query)
count = 40
matches = enquire.get_mset(0, count)
estimated = matches.get_matches_estimated()
print "%d/%d results" % (matches.size(), estimated)
data = dict((str(x["id"]), x) for x in simplejson.load(open(os.path.join(DBNAME, "all.json"))))
for m in matches:
    rec = data[m.document.get_data()]
    print rec["text"]
print "%d/%d results" % (matches.size(), matches.get_matches_estimated())
total = db.get_doccount()
estimated = matches.get_matches_estimated()
print "%d results over %d documents, %d%%" % (estimated, total, estimated * 100 / total)
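Both the query tool and the server rely on the same lookup trick: all.json is loaded once into a dict keyed by the stringified tweet id, so a Xapian match (whose document data is that id string) resolves straight to the full record. In miniature, with two made-up records:

```python
# Hypothetical stand-in for simplejson.load(open(".../all.json"))
tweets = [{"id": 101, "text": "car stolen in Wigan"},
          {"id": 102, "text": "dog found in Sale"}]
# Key by the stringified id, exactly as the scripts above do
data = dict((str(x["id"]), x) for x in tweets)

match_data = "101"  # what m.document.get_data() would return
print(data[match_data]["text"])
# prints: car stolen in Wigan
```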
Neat! Now that we have a proper index that supports all sorts of cool things, like stemming, tag clouds, full-text search with complex queries, lookup of similar documents, keyword suggestions and so on, it was only fair to put together a web service to share it with other people at the event. It helped that I had already written similar code for apt-xapian-index and dde before. Here is the server, quickly built on bottle. The very last line starts the server and it is where you can configure the listening interface and port.
#!/usr/bin/python
# xserve - Make the GMP24 tweet Xapian database available on the web
#
# Copyright (C) 2010  Enrico Zini <enrico@enricozini.org>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.
import bottle
from bottle import route, post
from cStringIO import StringIO
import cPickle as pickle
import simplejson
import sys
import os, os.path
import xapian
import urllib
import math
bottle.debug(True)
DBNAME = sys.argv[1]
QUERYLOG = os.path.join(DBNAME, "queries.txt")
data = dict((str(x["id"]), x) for x in simplejson.load(open(os.path.join(DBNAME, "all.json"))))
prefixes = {"place": "XP", "soc": "XS", "loc": "XL"}
prefix_desc = {"place": "Place name", "soc": "Social category", "loc": "Social location"}
db = xapian.Database(DBNAME)
stem = xapian.Stem("english")
qp = xapian.QueryParser()
qp.set_default_op(xapian.Query.OP_AND)
qp.set_database(db)
qp.set_stemmer(stem)
qp.set_stemming_strategy(xapian.QueryParser.STEM_SOME)
for k, v in prefixes.iteritems():
    qp.add_boolean_prefix(k, v)
def make_query(qstring):
    return qp.parse_query(qstring,
        xapian.QueryParser.FLAG_BOOLEAN |
        xapian.QueryParser.FLAG_LOVEHATE |
        xapian.QueryParser.FLAG_BOOLEAN_ANY_CASE |
        xapian.QueryParser.FLAG_WILDCARD |
        xapian.QueryParser.FLAG_PURE_NOT |
        xapian.QueryParser.FLAG_SPELLING_CORRECTION |
        xapian.QueryParser.FLAG_AUTO_SYNONYMS)
@route("/")
def index():
    query = urllib.unquote_plus(bottle.request.GET.get("q", ""))
    out = StringIO()
    print >>out, '''
<html>
<head>
<title>Query</title>
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js"></script>
<script type="text/javascript">
$(function() {
    $("#queryfield")[0].focus()
})
</script>
</head>
<body>
<h1>Search</h1>
<form method="POST" action="/query">
Keywords: <input type="text" name="query" value="%s" id="queryfield">
<input type="submit">
<a href="http://xapian.org/docs/queryparser.html">Help</a>
</form>''' % query
    print >>out, '''
<p>Example: "car place:wigan"</p>

<p>Available prefixes:</p>

<ul>
'''
    for pfx in prefixes.keys():
        print >>out, "<li><a href='/catinfo/%s'>%s - %s</a></li>" % (pfx, pfx, prefix_desc[pfx])
    print >>out, '''
</ul>
'''
    oldqueries = []
    if os.path.exists(QUERYLOG):
        total = db.get_doccount()
        fd = open(QUERYLOG, "r")
        while True:
            try:
                q = pickle.load(fd)
            except EOFError:
                break
            oldqueries.append(q)
        fd.close()
        def print_query(q):
            count = q["count"]
            print >>out, "<li><a href='/query?query=%s'>%s (%d/%d %.2f%%)</a></li>" % (urllib.quote_plus(q["q"]), q["q"], count, total, count * 100.0 / total)
        print >>out, "<p>Last 10 queries:</p><ul>"
        for q in oldqueries[:-10:-1]:
            print_query(q)
        print >>out, "</ul>"
        # Remove duplicates
        oldqueries = dict(((x["q"], x) for x in oldqueries)).values()
        print >>out, "<table>"
        print >>out, "<tr><th>10 queries with most results</th><th>10 queries with least results</th></tr>"
        print >>out, "<tr><td>"
        print >>out, "<ul>"
        oldqueries.sort(key=lambda x:x["count"], reverse=True)
        for q in oldqueries[:10]:
            print_query(q)
        print >>out, "</ul>"
        print >>out, "</td><td>"
        print >>out, "<ul>"
        nonempty = [x for x in oldqueries if x["count"] > 0]
        nonempty.sort(key=lambda x:x["count"])
        for q in nonempty[:10]:
            print_query(q)
        print >>out, "</ul>"
        print >>out, "</td></tr>"
        print >>out, "</table>"
    print >>out, '''
</body>
</html>'''
    return out.getvalue()
@route("/query")
@route("/query/")
@post("/query")
@post("/query/")
def query():
    query = bottle.request.POST.get("query", bottle.request.GET.get("query", ""))
    enquire = xapian.Enquire(db)
    enquire.set_query(make_query(query))
    count = 40
    matches = enquire.get_mset(0, count)
    estimated = matches.get_matches_estimated()
    total = db.get_doccount()
    out = StringIO()
    print >>out, '''
<html>
<head><title>Results</title></head>
<body>
<h1>Results for "<b>%s</b>"</h1>
''' % query
    if estimated == 0:
        print >>out, "No results found."
    else:
        # Give as results the first 30 documents; also use them as the key
        # ones to use to compute relevant terms
        rset = xapian.RSet()
        for m in enquire.get_mset(0, 30):
            rset.add_document(m.document.get_docid())
        # Compute the tag cloud
        class NonTagFilter(xapian.ExpandDecider):
            def __call__(self, term):
                return not term[0].isupper() and not term[0].isdigit()
        cloud = []
        maxscore = None
        for res in enquire.get_eset(40, rset, NonTagFilter()):
            # Normalise the score in the interval [0, 1]
            weight = math.log(res.weight)
            if maxscore == None: maxscore = weight
            tag = res.term
            cloud.append([tag, float(weight) / maxscore])
        max_weight = cloud[0][1]
        min_weight = cloud[-1][1]
        cloud.sort(key=lambda x:x[0])
        def mklink(query, term):
            return "/query?query=%s" % urllib.quote_plus(query + " and " + term)
        print >>out, "<h2>Tag cloud</h2>"
        print >>out, "<blockquote>"
        for term, weight in cloud:
            size = 100 + 100.0 * (weight - min_weight) / (max_weight - min_weight)
            print >>out, "<a href='%s' style='font-size:%d%%; color:brown;'>%s</a>" % (mklink(query, term), size, term)
        print >>out, "</blockquote>"
        print >>out, "<h2>Results</h2>"
        print >>out, "<p><a href='/'>Search again</a></p>"
        print >>out, "<p>%d results over %d documents, %.2f%%</p>" % (estimated, total, estimated * 100.0 / total)
        print >>out, "<p>%d/%d results</p>" % (matches.size(), estimated)
        print >>out, "<ul>"
        for m in matches:
            rec = data[m.document.get_data()]
            print >>out, "<li><a href='/item/%s'>%s</a></li>" % (rec["id"], rec["text"])
        print >>out, "</ul>"
        fd = open(QUERYLOG, "a")
        qinfo = dict(q=query, count=estimated)
        pickle.dump(qinfo, fd)
        fd.close()
    print >>out, '''
<a href="/">Search again</a>

</body>
</html>'''
    return out.getvalue()
@route("/item/:id")
@route("/item/:id/")
def show(id):
    rec = data[id]
    out = StringIO()
    print >>out, '''
<html>
<head><title>Result %s</title></head>
<body>
<h1>Raw JSON record for twit %s</h1>
<pre>''' % (rec["id"], rec["id"])
    print >>out, simplejson.dumps(rec, indent=" ")
    print >>out, '''
</pre>
</body>
</html>'''
    return out.getvalue()
@route("/catinfo/:name")
@route("/catinfo/:name/")
def catinfo(name):
    prefix = prefixes[name]
    out = StringIO()
    print >>out, '''
<html>
<head><title>Values for %s</title></head>
<body>
''' % name
    terms = [(x.term[len(prefix):], db.get_termfreq(x.term)) for x in db.allterms(prefix)]
    terms.sort(key=lambda x:x[1], reverse=True)
    # terms is sorted by frequency, descending, so the first entry is the max
    freq_max = terms[0][1]
    freq_min = terms[-1][1]
    def mklink(name, term):
        return "/query?query=%s" % urllib.quote_plus(name + ":" + term)
    # Build tag cloud
    print >>out, "<h1>Tag cloud</h1>"
    print >>out, "<blockquote>"
    for term, freq in sorted(terms[:20], key=lambda x:x[0]):
        size = 100 + 100.0 * (freq - freq_min) / (freq_max - freq_min)
        print >>out, "<a href='%s' style='font-size:%d%%; color:brown;'>%s</a>" % (mklink(name, term), size, term)
    print >>out, "</blockquote>"
    print >>out, "<h1>All terms</h1>"
    print >>out, "<table>"
    print >>out, "<tr><th>Occurrences</th><th>Name</th></tr>"
    for term, freq in terms:
        print >>out, "<tr><td>%d</td><td><a href='/query?query=%s'>%s</a></td></tr>" % (freq, urllib.quote_plus(name + ":" + term), term)
    print >>out, "</table>"
    print >>out, '''
</body>
</html>'''
    return out.getvalue()
# Change here for bind host and port
bottle.run(host="0.0.0.0", port=8024)
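The query-log pattern the server uses, appending one pickled dict per search and reading them back until EOFError, round-trips like this (Python 3 sketch; the throwaway temp path stands in for the queries.txt inside the database directory):

```python
import os
import pickle
import tempfile

path = os.path.join(tempfile.mkdtemp(), "queries.txt")

# Writer side: each search appends one pickled record to the log
for qinfo in ({"q": "car", "count": 3}, {"q": "dog place:sale", "count": 1}):
    with open(path, "ab") as fd:
        pickle.dump(qinfo, fd)

# Reader side: load records until the file is exhausted
oldqueries = []
with open(path, "rb") as fd:
    while True:
        try:
            oldqueries.append(pickle.load(fd))
        except EOFError:
            break
print([q["q"] for q in oldqueries])
# prints: ['car', 'dog place:sale']
```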
...and then we presented our work and ended up winning the contest. This was the story of how we wrote this set of award-winning code.

27 September 2010

Steve McIntyre: Armel buildds and porter box hosted at ARM

One of the nice things that I've been involved with since starting to work at ARM in Cambridge is setting up newer, faster machines to help with the armel port. We have six machines hosted in the machine room here now: arne All of these machines are Marvell DB-78x00-BP development boards, each configured with a 1GHz Feroceon processor (ARM v5t), 1.5GB of RAM and a 250GB drive attached via SATA. They're nice machines, reasonably powerful yet (as with many ARM-based machines) they draw very very little electrical power even when working hard. These very boards were used for a while by the folks at Canonical to help build the Ubuntu armel port, but now we've got them. In terms of configuration, these machines are not quite fully supported in Debian yet, though. The kernels we're using are locally-built, based on the Debian linux-source-2.6.32 package but with a .config (marvell.config) that's tweaked slightly to add the support for these boards. There aren't any source changes needed, so I'm hoping to get support added directly in Debian, either as a new kernel flavour or (preferred) as a patch to an existing flavour. I've had conflicting advice about whether the latter is possible, so I'm going to have to experiment and find out for myself. UPDATE 2010-09-28: I've tested, and it seems that the boards will need a new flavour after all, as the config is incompatible with the closest other config (kirkwood). Ah well... I had no end of trouble trying to get make-kpkg to do the right thing, so on advice from Ben I built the kernel using "make deb-pkg", a standard target in the Linux kernel's build system: fakeroot make -j2 deb-pkg DEBEMAIL=93sam@debian.org DEBFULLNAME="Steve McIntyre" KDEB_PKGVERSION=buildd23 Annoyingly, that wouldn't work when cross-compiling either so I had to build the kernel natively.
To make the resulting kernel image package install properly (and, just as importantly, allow for future easy upgrades for the DSA folks), I also needed the following tweaks to the Debian system: Finally, I've tweaked the uboot config on the machines to use the uImage and uInitrd files that are generated:
Marvell>> setenv IDE ide reset
Marvell>> setenv loadkernel ext2load ide 0:1 0x2000000 /uImage
Marvell>> setenv loadinitramfs ext2load ide 0:1 0x3000000 /uInitrd
Marvell>> setenv bootboth bootm 0x2000000 0x3000000
Marvell>> setenv bootcmd setenv bootargs \$\(bootargs\)\;$(IDE)\;$(loadkernel)\;$(loadinitramfs)\;$(bootboth)
Marvell>> saveenv
And that's it, as far as I can see. I'll now wait for people to tell me what I've got wrong above... :-)

8 September 2010

Andrew McMillan: Making my laptop quiet

A few days ago I talked about how noisy my new Dell Studio 15 was but I can now report back with the beginnings of a solution to that problem, and it doesn't appear to be ACPI related. The first clue I needed was that if I switch to the proprietary fglrx drivers for the Radeon the fan quite quickly drops off to a much more reasonable level. It seems the fglrx drivers have issues, however, in particular I get big black patches on my screen. This video corruption happens especially in Firefox, but sometimes in other applications as well. They also appear to screw up my suspend/resume, which is probably even more annoying to me. The second clue that I needed was that Radeon power management support has only just made it into recent kernels. Thanks to Michael Kirkland for providing me with both of those clues :-) Looking under /sys/class/drm/ I find a whole bunch of stuff, but in particular there are /sys/class/drm/card0/device/power_method and /sys/class/drm/card0/device/power_profile. Looking through the kernel source code I can see that power_profile can be set to low, mid, high, auto and default, while power_method can be set to either dynpm or profile. Trying out all of these values, it seems I get the quietest result with the profile method and either the low or mid profile. The dynpm method is nearly as good, and I would think it should really be the default for a 'Mobility' chipset. From the detailed benchmarking that Phoronix did I wonder if it shouldn't be the default for everyone. For myself, I see some small 'tearing' artifacts occasionally when running with the low profile. These disappear when I run with the mid profile, and since that seems to have pretty much the same temperature (and noise) results I'll go with that one. Though the laptop often does still make more noise than I would prefer it to, it is no longer annoying everyone in the room. Not unexpectedly this seems to have a huge impact on power use, too. 
It appears that the laptop should now give me around 4.5 hours when I do everything I can think of to lower the power use, whereas before it was more like 2.5 hours. Now I guess I can get back to hacking on DAViCal...

7 September 2010

C.J. Adams-Collier: Debian on Sheeva Plug internal NAND flash

After a bit of work, I got the sheeva plug working the way I wanted it to. First of all, I grabbed a spare 1G USB flash disk I had laying around and installed Debian squeeze to it by following tbm's instructions here: http://www.cyrius.com/debian/kirkwood/sheevaplug/install.html After Debian was installed to the USB disk, I removed it from the plug and used dd on my laptop to create an image of the filesystem. I mounted the filesystem as a loopback device and created a jffs2 partition image from it after doing a bit of minor tweaking. I placed the USB disk back in the plug and booted to it. Using this intermediate filesystem and the mtd-tools package, I wrote the new jffs2 image to the NAND mtd device. I then modified the uboot environment to suit, saved, and now have a working setup. Detailed instructions are below. pre-requisites Connect the plug to your network with the rj-45/cat5 cable. Connect a USB flash drive to the plug. I used a 1G drive, but 512M should be sufficient. Install the tftpd package on a machine on your network. You'll also need screen on the machine that you use to connect to the plug's serial terminal. I'll assume these are the same host. You need the IP address of the tftp server. In this example, we will assume that the IP is 192.168.1.2 and that you will assign 192.168.1.200 to the plug. $ sudo apt-get install tftpd screen Place uImage, uInitrd, and uBoot in the /srv/tftp directory: $ sudo wget -O /srv/tftp/uImage http://www.cyrius.com/tmp/beta1/marvell/sheevaplug/uImage
$ sudo wget -O /srv/tftp/uInitrd http://www.cyrius.com/tmp/beta1/marvell/sheevaplug/uInitrd
$ sudo wget -O /srv/tftp/uBoot http://wp.colliertech.org/cj/wp-content/uploads/2010/09/07/uboot.bin Connecting to the serial console We are now ready to boot the plug. Note that you have to be quick with the screen command. I recommend you type it out and get ready to press the enter key. You have to interrupt the bootloader in order to enter the u-boot console. Attach the power and immediately enter the following command: $ screen -S sheeva /dev/ttyUSB0 cs8,ixoff,115200 If you did this right, you should see a prompt like this: Marvell>> update uboot As per tbm's instructions, you can now update the uboot if needed. Instructions for that are here: http://www.cyrius.com/debian/kirkwood/sheevaplug/uboot-upgrade.html I'll put them inline here for completeness.
Marvell>> setenv serverip 192.168.1.2 # IP of your TFTP server
Marvell>> setenv ipaddr 192.168.1.200 # IP of the plug
Marvell>> bubt uboot.bin
be sure to answer n to the question about env parameters. Now reset the device. Don't forget to interrupt the bootloader so we can get back to the prompt:
Marvell>> reset
Let the device know that you will be running a mainline kernel and set the arcNumber:
Marvell>> setenv mainlineLinux yes
Marvell>> setenv arcNumber 2097
Marvell>> saveenv
Marvell>> reset
Install Debian to the USB disk Interrupt the boot process once more. We are now ready to run the Debian installer. I'll leave the details as an exercise for the reader. The following should put you into the installer, which you are probably quite familiar with by now. NB: I will assume that you will use a single ext3 partition for the entire system. This will make it easier to build a jffs2 image out of the resultant partition on the USB disk.
Marvell>> setenv serverip 192.168.1.2
Marvell>> setenv ipaddr 192.168.1.147
Marvell>> tftpboot 0x01100000 uInitrd
Marvell>> tftpboot 0x00800000 uImage
Marvell>> setenv bootargs console=ttyS0,115200n8 base-installer/initramfs-tools/driver-policy=most
Marvell>> bootm 0x00800000 0x01100000
build a jffs2 image Once the installation has completed, power down the plug and remove the USB disk. Put the USB disk in the machine you used to get the console on the plug and note what device the kernel assigns to it. In my case, it was given sdc, so debian is installed to /dev/sdc1. I will use these values for this example. If the filesystem was automatically mounted, unmount it. Create a disk image of the partition and mount it as a loopback device.
$ sudo umount /dev/sdc1
$ dd if=/dev/sdc1 of=/tmp/debian.img
$ mkdir /tmp/mnt
$ sudo mount -o loop /tmp/debian.img /tmp/mnt
You can now remove the USB disk and return it to the plug. Modify the /etc/fstab file. The root filesystem will be /dev/mtdblock2 and of fs type jffs2. My fstab file looks like the following:
proc            /proc           proc    defaults        0       0
/dev/mtdblock2 /               jffs2    errors=remount-ro 0       1
We can now create a jffs2 image from the mounted and altered fresh install:
$ sudo mkfs.jffs2 -l \
  -e 0x20000 \
  -X zlib \
  --eraseblock=128KiB \
  --pad \
  --output=/tmp/rootfs.jffs2 \
  --compression-mode=priority \
  -n \
  --squash \
  -r /tmp/mnt
write jffs2 to nand Now that we have an image to flash to the NAND, let's boot off of the USB disk. It should now be attached to the plug. Power on the plug and use screen to get a console. Interrupt the bootloader and enter the following commands:
Marvell>> setenv bootargs_console console=ttyS0,115200
Marvell>> setenv bootcmd_usb 'usb start; ext2load usb 0:1 0x01100000 /boot/uInitrd; ext2load usb 0:1 0x00800000 /boot/uImage'
Marvell>> setenv bootcmd 'setenv bootargs $(bootargs_console); run bootcmd_usb; bootm 0x00800000 0x01100000'
Marvell>> run bootcmd
Log in as root with the credentials you configured during the install. You will need to install a few packages in order to complete the nand flash. You can then ssh to the host on which you created the jffs2 image and cat it to stdout, piping this to nandwrite:
$ sudo apt-get install mtd-utils
$ ssh user@192.168.1.2 cat /tmp/rootfs.jffs2 | sudo nandwrite /dev/mtd2 -p -
You now have a jffs2 image on the nand. Configure u-boot One more reboot to set u-boot's environment, and you will be done. Power down and remove the USB disk. Power on and get the serial console using screen. Break into the bootloader and enter the following commands:
Marvell>> setenv bootargs_console console=ttyS0,115200
Marvell>> setenv mtdpartitions mtdparts=orion_mtd:0x400000@0x100000(uImage),0x1fb00000@0x500000(rootfs)
Marvell>> setenv bootargs_root root=/dev/mtdblock2 rw rootfstype=jffs2
Marvell>> setenv bootcmd 'setenv bootargs $(bootargs_console) $(mtdpartitions) $(bootargs_root); nand read.e 0x00800000 0x00100000 0x00400000; bootm 0x00800000'
Marvell>> saveenv
Marvell>> reset
You should now be good to go.

21 June 2010

Matt Zimmerman: Finishing books

Having invested in some introspection into my reading habits, I made up my mind to dial down my consumption of bite-sized nuggets of online information, and finish a few books. That's where my bottleneck has been for the past year or so. Not in selecting books, not in acquiring books, and not in starting books either. I identify promising books, I buy them, I start reading them, and at some point, I put them down and never pick them back up again. Until now. Over the weekend, I finished two books. I started reading both in 2009, and they each required my sustained attention for a period measured in hours in order to finish them. Taking a tip from Dustin, I decided to try alternating between fiction and non-fiction. Jitterbug Perfume by Tom Robbins This was the first book I had read by Tom Robbins, and I am in no hurry to read any more. It certainly wasn't without merit: its themes were clever and artfully interwoven, and the prose elicited a silent chuckle now and again. It was mainly the characters which failed to earn my devotion. They spoke and behaved in ways I found awkward at best, and problematic at worst. Race, gender, sexuality and culture each endured some abuse on the wrong end of a pervasive white male heteronormative American gaze. I really wanted to like Priscilla, who showed early promise as a smart, self-reliant individual, whose haplessness was balanced by a strong will and sense of adventure. Unfortunately, by the later chapters, she was revealed as yet another vacant vessel yearning to be filled by a man. She's even the steward of a symbolic, nearly empty perfume bottle throughout the book. Yes, really. Managing Humans by Michael Lopp Of the books I've read on management, this one is perhaps the most outrageously reductionist. Many management books are like this, to a degree.
They take the impossibly complex problem domain of getting people to work together, break it down into manageable problems with tidy labels, and prescribe methods for solving them (which are hopefully appropriate for at least some of the reader's circumstances). Managing Humans takes this approach to a new level, drawing neat boxes around such gestalts as companies, roles, teams and people, and assigning them Proper Nouns. Many of these bear a similarity to concepts which have been defined, used and tested elsewhere, such as psychological types, but the text makes no effort to link its own concepts to them. Despite being a self-described "collection of tales", it's structured like a textbook, ostensibly imparting nuggets of managerial wisdom acquired through lessons learned in the Real World (so pay attention!). However, as far as I can tell, the author's experience is limited to a string of companies of a very specific type: Silicon Valley software startups in the dot-com era. Lopp (also known as Rands) does have substantial insight into this problem domain, though, and does an entertaining job of illustrating the patterns which have worked for him. If you can disregard the oracular tone, grit your teeth through the gender stereotyping, and add an implicit preface that this is (sometimes highly) context-sensitive advice, this book can be appreciated for what it actually is: a coherent, witty and thorough exposition of how one particular manager does their job. I got some good ideas out of this book, and would recommend it to someone working in certain circumstances, but as with Robbins, I'm not planning to track down further work by the same author.

13 June 2010

Martin Michlmayr: Debian support for eSATA SheevaPlug available

The eSATA SheevaPlug is supported by the Debian installer and by Debian now. I've updated the install guide accordingly. If you're already running Debian on your eSATA SheevaPlug but you installed as a regular SheevaPlug to USB or SD and you'd like to use the eSATA, then make sure you're running the latest kernel from Debian squeeze:
apt-get update
apt-get dist-upgrade
flash-kernel
Reboot and type this in u-boot:
setenv arcNumber 2678
saveenv
reset
Your machine will then be recognized as an eSATA SheevaPlug and eSATA will work. Thanks to John Holland for working on SheevaPlug eSATA support.
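For reference, here is a minimal sketch of the machine-type IDs (arcNumber values) that appear in these howtos. The mapping is assembled from the posts on this page and the ARM machine registry, so treat it as informative rather than exhaustive:

```python
# arcNumber values seen in these howtos (ARM machine-type registry IDs).
# This mapping is illustrative only; consult the ARM machine registry for
# the authoritative list.
ARC_NUMBERS = {
    2097: "SheevaPlug",        # also set on the GuruPlug below to boot the Debian kernel
    2678: "eSATA SheevaPlug",  # makes the plug be recognized as the eSATA variant
}

print(ARC_NUMBERS[2678])  # → eSATA SheevaPlug
```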

29 May 2010

Bernd Zeimetz: installing debian on the guruplug server plus

Yesterday my GuruPlug Server Plus finally arrived. It took longer than expected, but it seems Globalscale had some issues with the power supplies, and they were replaced before shipping the GuruPlugs. GuruPlug Server Plus and JTAG Board

The GuruPlug

Basically the GuruPlugs are an enhanced version of the well-known SheevaPlugs; the biggest difference is probably the need for an external JTAG/UART<>RS232 board to access the serial console. The good thing is that the board comes with a normal JTAG connector and an additional RS232 connector and 2.5V DC power outlet, so it will be useful for other devices, too. Globalscale could have chosen different connectors with less wiggly cables, though. As I was not able to find a useful howto about installing Debian on the GuruPlug, I've written down what I did to install Debian unstable on a micro SD card using the Debian installer for the GuruPlug. I did not check who modified the Debian installer to work on the plug, but thanks for that! The instructions below are based on Martin Michlmayr's awesome SheevaPlug documentation, the hints from oinkzwurgl.org/guruplug and various forum posts in the plugforum.

Preparations

Please note that I'm not responsible for whatever you're doing with your plug. If you follow this tutorial and end up with a brick, it is your fault, not mine. To install the GuruPlug you need the JTAG board. Connect the UART port to the GuruPlug and the JTAG board to your computer; it should show up as an FTDI (thanks for using good chips!) USB<>serial converter. Serial port settings are 115200, 8-N-1, no hw/sw flow control. The other thing you should prepare is a working tftpd; I'm using atftpd.
apt-get install atftpd
Recent versions share files from /srv/tftp/; in case you're running Lenny, /var/lib/tftpboot/ should be the place to drop your files. When everything is connected properly, the boot process should show up in minicom; make sure to press some key to enter uBoot. The first thing you should do is to save the original uBoot environment in case you want to restore the factory settings later. Run
printenv
and save the output somewhere.

Upgrading uBoot

Unfortunately the uBoot version on the GuruPlug is pretty old and seems to have some issues booting from USB devices, so the first thing you should do is to upgrade it. You might want to investigate whether there is an even better, more recent uBoot version available somewhere, or build one on your own, but I didn't bother and took the uBoot.guruplug.bin from here. The main issue with that version is that booting from USB still seems to be buggy (even for FAT partitions) and that ext2load is still not supported. Otherwise it works well :-). I'm mainly following Martin Michlmayr's tutorial again. Download the uBoot.guruplug.bin and drop it into the tftpd directory. Make sure you always set the plug's IP address (ipaddr) and your server's IP address (serverip) properly. I'll use 192.168.121.253 for the plug and 192.168.121.2 for the server in all examples; make sure to change that for your own needs. Stop if something goes wrong, especially when the tftp download fails.
setenv ipaddr 192.168.121.253
setenv serverip 192.168.121.2
tftp 0x6400000 uBoot.guruplug.bin
nand erase 0x00000000 0x0100000
nand write 0x6400000 0x0000000 0x80000
reset
Enter uBoot again after the reset.
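The flash layout in the commands above can be sanity-checked with a little arithmetic: the erased region must cover the region written, and the write length must cover the downloaded image. A small Python sketch, where the 0x80000 image size is an assumption matching the write command:

```python
# Sanity-check the NAND layout used when flashing uBoot (sketch; sizes
# mirror the commands above, with 0x80000 assumed as the image size).
erase_off, erase_len = 0x000000, 0x100000   # nand erase 0x00000000 0x0100000
write_off, write_len = 0x000000, 0x80000    # nand write 0x6400000 0x0000000 0x80000
image_size = 0x80000                        # bytes reported by the tftp download

# The write must fall entirely inside the erased region,
# and the image must fit into the write length.
assert erase_off <= write_off
assert write_off + write_len <= erase_off + erase_len
assert image_size <= write_len
print("layout ok")
```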

Preparing the installer

Download uImage and uInitrd to your tftpd directory. Although I've heard that setting mainlineLinux/arcNumber in the uBoot environment is not necessary anymore for very recent kernels, let's set them to make sure the Debian kernel works:
setenv mainlineLinux yes
setenv arcNumber 2097
saveenv
reset
Again, enter uBoot after the reset.

Running the installer

Run the following in the uBoot console:
setenv ipaddr 192.168.121.253
setenv serverip 192.168.121.2
tftpboot 0x01100000 uInitrd
tftpboot 0x00800000 uImage
setenv bootargs console=ttyS0,115200n8 base-installer/initramfs-tools/driver-policy=most
bootm 0x00800000 0x01100000
You should see the installer starting now. You might want to follow the following hints:
  • Before configuring the network, go back and set the debconf priority to low, then continue. While choosing the Debian mirror, choose sid as the Debian version to install. If you don't have sid as a choice, use a different mirror. The kernel in testing does not boot on the GuruPlug; you need 2.6.32-13 from sid.
  • You might want to load the 'network console' installer component and continue via ssh. It makes things faster and colourful.
  • Suggested partitioning: I've installed Debian to an 8GB micro SDcard. The SDcard reader is connected via USB and shows up as /dev/sdb (/dev/sda should be the internal NAND and is not shown by the installer). I've used 150MB ext2 for /boot and the rest of the space for /, using ext4. You might want to use the noatime option on both filesystems to avoid unnecessary write access to the SDcard. You might choose to add a swap partition, but SDcards are so slow that I've skipped that.
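The noatime suggestion would look roughly like this in the installed system's /etc/fstab (a sketch only; the device names match the layout suggested above, but yours may differ, and the installer may use UUIDs instead):

```
/dev/sdb1  /boot  ext2  defaults,noatime                     0  2
/dev/sdb2  /      ext4  defaults,noatime,errors=remount-ro   0  1
```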
When you continue the installation, you will hit the following problems:
  • The installer might fail at 'Make the system bootable' (not sure if it does for you; it did so with the kernel from testing).
  • The uBoot is not able to boot from your /boot anyway; USB support is buggy and ext2load is missing.
We'll work around these issues by writing the kernel and initrd into the plug's NAND. To do so, enter a shell in the installer and chroot into the install target. We'll then create the necessary uImage.bin and uInitrd and scp them to our tftpd directory:
chroot /target /bin/bash
cd /boot
mkimage -n vmlinuz -A arm -O linux -T kernel -C none -a 0x00008000 -e 0x00008000 -d /boot/vmlinuz uImage.bin
mkimage -n 'vmlinuz initrd' -A arm -O linux -T ramdisk -C gzip -d /boot/initrd.img uInitrd
scp uI* root@192.168.121.2:/srv/tftp
Now leave the shell, finish the installation and reboot, enter uBoot again.

Make the plug bootable

To write the kernel and initrd to the NAND memory, we have to transfer them via tftp first, then erase the NAND area we want to write to, and then write them to the NAND. The values I've chosen here should be fine for the current Debian kernel, but you might need to change the necessary size for the initrd. To do so, have a look at the output while transferring the initrd - the number of transferred bytes is displayed. They have to fit into the amount of bytes you write (the last option to nand write.e).
setenv ipaddr 192.168.121.253
setenv serverip 192.168.121.2
tftp 0x6400000 uImage.bin
nand erase 0x100000 0x400000
nand write.e 0x6400000 0x100000 0x400000
tftp 0x6400000 uInitrd
nand erase 0x500000 0x1fb00000
nand write.e 0x6400000 0x500000 0x600000
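The "must fit" constraint above is easy to get right with a little arithmetic: round the transferred size up to a whole number of NAND erase blocks and use that as the write length. A sketch in Python, assuming the common 128 KiB (0x20000) erase-block size; check the output of "nand info" in uBoot for your chip's actual value:

```python
# Round a file size up to the NAND erase-block boundary so the length
# passed to "nand write.e" covers the whole transferred file.
# 128 KiB (0x20000) erase blocks are an assumption; verify with "nand info".
ERASE_BLOCK = 0x20000

def nand_write_length(file_size, block=ERASE_BLOCK):
    """Smallest multiple of the erase block that holds file_size bytes."""
    return ((file_size + block - 1) // block) * block

# e.g. the 0x400000 kernel write above is already block-aligned:
print(hex(nand_write_length(0x400000)))  # → 0x400000
```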
Now we need to set the necessary boot options. Make sure to change the root device if you've chosen a different layout from the one I suggested above, or if you're not using an SDcard.
setenv bootargs_debian 'console=ttyS0,115200 root=/dev/sdb2'
setenv bootcmd_nand 'nand start; nand read.e 0x00800000 0x100000 0x400000; nand read.e 0x01100000 0x500000 0x600000'
setenv bootcmd 'setenv bootargs $(bootargs_debian); run bootcmd_nand; bootm 0x00800000 0x01100000'
saveenv
run bootcmd

Finish

Your GuruPlug should boot your new Debian installation now. Have fun! I'll try to keep the howto updated for changes in uBoot and the installer, but I might not have the time to do so quickly. Patches and comments are welcome!
root@guruplug:~# uname -a
Linux guruplug 2.6.32-5-kirkwood #1 Fri May 21 05:44:29 UTC 2010 armv5tel GNU/Linux
root@guruplug:~# cat /proc/cpuinfo 
Processor   : Feroceon 88FR131 rev 1 (v5l)
BogoMIPS    : 1192.75
Features    : swp half thumb fastmult edsp 
CPU implementer : 0x56
CPU architecture: 5TE
CPU variant : 0x2
CPU part    : 0x131
CPU revision    : 1
Hardware    : Marvell GuruPlug Reference Board
Revision    : 0000
Serial      : 0000000000000000
root@guruplug:~# 

20 May 2010

Martin Michlmayr: Debian on QNAP TS-11x/TS-21x/TS-41x users: go make a backup

I recently discovered that there are two variants of the recovery mode used on QNAP TS-11x/TS-21x (and possibly TS-41x) devices and that one behaves differently from what my documentation claims. While this issue should hopefully affect few users (but please take a moment to check if you're affected), it has implications for all Debian users on TS-11x/TS-21x. My install guide originally told users to create a backup of only some mtd partitions, but from now on you need a copy of all partitions in order to use the recovery mode. Therefore, please take a moment now to create a backup of the remaining partitions:
cat /dev/mtdblock0 > mtd0
cat /dev/mtdblock4 > mtd4
cat /dev/mtdblock5 > mtd5
(You should have copies of mtd1, mtd2 and mtd3 already if you followed my guide.) Make sure to copy the files to another machine and add them to your backup.
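One extra safeguard worth considering (not from the original post): record checksums alongside the dumps so the copies can be verified on the backup machine later. A small Python sketch; the file names match the cat commands above, and files that don't exist are simply skipped:

```python
# Sketch: record SHA-256 checksums of the mtd dumps so the backup copies
# can be verified later. File names match the backup commands above.
import hashlib
import os

def sha256_of(path, bufsize=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(bufsize):
            h.update(chunk)
    return h.hexdigest()

for name in ["mtd0", "mtd1", "mtd2", "mtd3", "mtd4", "mtd5"]:
    if os.path.exists(name):  # dumps created by the cat commands above
        print(name, sha256_of(name))
```

Keep the printed digests with the backup and recompute them on the other machine to confirm the copies are intact.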
